This is the Opportunity Mapping 2.0 Technical Document produced by Phuong Tseng. The intention is to capture changes and developments in the 2019 version.
In 2019, there are 5 domains: education, economic & mobility, housing and neighborhood, conduit, and social capital. The social capital domain is a new domain in 2019.
This year, the education domain added a new indicator called Early Childhood Participation Rate (Pre-K). Another indicator, adults with a bachelor’s degree, was moved from the education domain to the economic & mobility domain in 2019.
common_fields <- c("fips",
"CountyID.x",
"TOTPOP.x", "county_name.x")
edu_list <-
c(
"math_prof",
"read_prof",
"grad_rate",
"pct_not_frpm",
"z_math_prof",
"z_read_prof",
"z_grad_rate",
"az_pct_not_frpm",
"HD01_VD04",
"HD01_VD03",
"ratio",
"ratio2",
"z_preK"
)
There are a few changes to the economic & mobility domain in 2019. Three indicators were added to this domain: adults with a bachelor’s degree, median household income, and median home value. Other indicators, such as the commuting time and entry-level jobs measures, were changed to TCAC’s measures. A new indicator, school district revenue per capita, was added to capture the extent of municipal hoarding. Because the municipal data had reliability issues, school district boundaries were used as a proxy instead.
econ_list <- c(
"total_pop_2017",
"below_200_pov_2017.x",
"moe_below_200_pov_2017.x",
"pct_below_pov_2017",
"moe_pct_below_pov_2017",
"pct_below_200_pov_2017.x",
"pct_assist_2017",
"med_hhincome_2017" ,
"moe_med_hhincome_2017" ,
"employed_pop_20to60_2017",
"pct_employed_20to60_2017",
"home_value_2017" ,
"moe_home_value_2017",
"pct_bachelors_plus_2017",
"above_200_pov_2017",
"pct_above_200_pov_2017",
"tot_hh_2017",
"moe_tot_hh_2017",
"moe_pct_long_commute_2017",
"moe_assist_2017",
"moe_long_commute_pct",
"long_commute_pct",
"low_wage_med_distance" ,
"jobs_lowed" ,
"rural_flag",
"az_pct_assist_2017" ,
"az_pct_employed_20to60_2017",
"z_home_value_2017" ,
"z_pct_bachelors_plus_2017" ,
"az_pct_long_commute_2017",
"z_jobs_lowed" ,
"Econ_Domain",
"z_sdrevpcap",
"sdrev",
"sdrevpcap",
"sd_totpop"
)
The housing and neighborhood opportunity domain has two new environmental indicators pulled from CalEnviroScreen (i.e. pm25, lead).
housing_list <-
c("below_200_pov_2017.y",
"moe_below_200_pov_2017.y",
"pct_below_200_pov_2017.y",
"pm25",
"pct_pm25",
"toxRelease",
"pct_toxRelease",
"lead_pctl",
"pct_lead_pctl" ,
"Grocery",
"z_Grocery" ,
"az_Grocery",
"P_INSURED" ,
"az_insurance" ,
"H_Crime",
"pct_parks",
"az_pct_below_200_pov_2017",
"az_pct_below_200_pov_20172",
"az_pct_pm25",
"az_pct_toxRelease",
"az_pct_lead_pctl" ,
"Housing_Env_Domain",
"test_azcrime" ,
"azhealthcare" ,
"zparks"
)
The Conduit domain has two indicators: median broadband download speed and percentage of single-parent households.
conduit_list <-
c(
"pct_singleparent_hh_2017.y",
"moe_pct_singleparent_hh_2017.y",
"az_pct_singleparent_hh_2017",
"TOTPOP.y",
"Median_bb",
"z_broadband",
"z_broadband2",
"Conduit"
)
The analyst decides whether it makes sense to calculate the index before or after the filtering process. In this case, I calculated the regional index for these tracts first and filtered the tracts in later steps, because it is important to display each tract's score next to its opportunity category for comparison. Our previous analyses show that some tracts have high index values even with a high percentage of single-parent households and concentrated poverty.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.757397 -0.130434 -0.006168 0.004954 0.119239 1.180872
Our filtering process consists of two conditions: 1) Poverty (below 200 FPL) >= 30% and Single-parent family >= 30%, OR 2) High Divergence with population of Black and Latinx > 50% and poverty (below 200 FPL) >= 30%. Steps 1 - 3 deal with the first condition, while steps 4 - 6 handle the second condition.
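The two conditions can be sketched in base R as follows. This is a minimal illustration rather than the code used in the analysis: the data frame, its column names, and its values are hypothetical, and the -1/0 coding is an assumption based on the summary output shown in this section.

```r
# Hypothetical tract table; column names and values are illustrative only.
tracts <- data.frame(
  fips = c("A", "B", "C", "D"),
  pct_below_200_pov = c(35, 10, 40, 32),     # percent below 200 FPL
  pct_singleparent_hh = c(31, 45, 12, NA),   # percent single-parent households
  pct_black_latinx = c(20, 60, 55, 70),      # percent Black and Latinx residents
  high_divergence = c(FALSE, TRUE, TRUE, TRUE)
)

# Condition 1: poverty >= 30% and single-parent households >= 30%
cond1 <- tracts$pct_below_200_pov >= 30 & tracts$pct_singleparent_hh >= 30

# Condition 2: High Divergence, Black and Latinx population > 50%, poverty >= 30%
cond2 <- tracts$high_divergence &
  tracts$pct_black_latinx > 50 &
  tracts$pct_below_200_pov >= 30

# Assumed coding: -1 for filtered tracts, 0 otherwise.
# In R, NA | TRUE is TRUE, so tract D is still flagged through condition 2.
tracts$filter_flag <- ifelse(cond1 | cond2, -1, 0)
print(tracts[, c("fips", "filter_flag")])
```

A tract whose inputs are missing for both conditions would receive NA instead of -1 or 0, which is one way NAs can appear in the record counts.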
returns 471 records with 8 NAs
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.0000 -1.0000 0.0000 -0.2983 0.0000 0.0000
returns 418 records with 3 NAs
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.0000 -1.0000 0.0000 -0.2647 0.0000 0.0000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.0000 0.0000 0.0000 -0.1849 0.0000 0.0000
## [1] -292
## [1] -201
## [1] -201
## [1] -418
## [1] -171
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.0000 -1.0000 0.0000 -0.2647 0.0000 0.0000
High Divergence with population of Black and Latinx > 50% and poverty (below 200 FPL) >= 30% OR Poverty (below 200 FPL) >= 30% and Single-parent family >= 30%
## [1] -320
Here, I take a slightly different approach to the categorization method. Instead of breaking the records into four categories of 25% each, I break them into five categories of 20% each, which means each category has the same number of records. This is because all of these records are categorized only by their index values rather than by the filters.
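A minimal sketch of the 20% breaks, assuming the index is a numeric vector. The values and the category labels below are hypothetical and are not taken from the analysis.

```r
# Hypothetical index values: 100 distinct numbers standing in for the regional index.
index <- seq(-0.75, 1.18, length.out = 100)

# Break points at the 0th, 20th, 40th, 60th, 80th, and 100th percentiles
breaks <- quantile(index, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Hypothetical labels for the five categories
labels <- c("Lowest", "Low", "Moderate", "High", "Highest")

# include.lowest = TRUE keeps the minimum value in the first category
category <- cut(index, breaks = breaks, labels = labels, include.lowest = TRUE)

print(table(category))
```

With distinct index values, each category receives the same number of records. Tied values at a break point can make the groups slightly uneven.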
These are records with NAs or missing values
1. fips 06081984300 has NaN in pct_pov_below_200 and pct_singleparent_hh
2. fips 06081984300 (Mod) changed to NAs
3. fips 06095253000 has NaN in pct_pov_below_200 and pct_singleparent_hh
4. fips 06095253000 (Highest) changed to NAs
5. fips 06095980000 has NaN in pct_pov_below_200 and pct_singleparent_hh
6. fips 06095980000 (High) changed to NAs
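The change described in the list above can be sketched as follows, assuming the category is stored as a character column. The table, its fips values, and its column names are hypothetical.

```r
# Hypothetical tract table; the second tract has NaN in both filter inputs.
tracts <- data.frame(
  fips = c("T1", "T2", "T3"),
  pct_pov_below_200 = c(25, NaN, 40),
  pct_singleparent_hh = c(12, NaN, 33),
  category = c("High", "Moderate", "Low"),
  stringsAsFactors = FALSE
)

# Identify tracts where either filter input is NaN
missing_inputs <- is.nan(tracts$pct_pov_below_200) |
  is.nan(tracts$pct_singleparent_hh)

# Replace the opportunity category of those tracts with NA
tracts$category[missing_inputs] <- NA
print(tracts)
```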
Data Source: ACS Census data 2010-2014
Description: To analyze the distribution of racial and ethnic composition by opportunity category, the user must first join the two datasets, then aggregate the population of each racial group within each opportunity category.
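A minimal sketch of the join and aggregation in base R. Both tables, their column names, and their values are hypothetical stand-ins for the opportunity categories and the ACS population counts.

```r
# Hypothetical opportunity categories by tract
opp <- data.frame(
  fips = c("001", "002", "003", "004"),
  category = c("High", "High", "Low", "Low"),
  stringsAsFactors = FALSE
)

# Hypothetical population counts by racial group and tract
race <- data.frame(
  fips = c("001", "002", "003", "004"),
  white = c(100, 200, 50, 25),
  black = c(10, 20, 80, 90),
  latinx = c(30, 40, 120, 60),
  stringsAsFactors = FALSE
)

# Step 1: join the two datasets on the tract identifier
joined <- merge(opp, race, by = "fips")

# Step 2: sum the population of each group within each opportunity category
pop_by_cat <- aggregate(cbind(white, black, latinx) ~ category,
                        data = joined, FUN = sum)
print(pop_by_cat)
```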
Data Source: American Community Survey (5-year-estimates)
Table: B19013_001 – MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
Data Source: ESRI Business Analyst
Spreadsheet: OV_YEAR_Payday
Description: 2017 Measure – Spatially join the Bay Area payday lending shapefile to the 2014 census tract shapefile with the opportunity categories to obtain the number of businesses per census tract. Then divide the number of businesses in each tract by the total number of payday lending and credit businesses in the Bay Area to obtain the percentage.
2018 Measure – Identify whether the column salevolume in the dataset holds the volume of payday loan sales. Aggregate those sales and distribute them to tracts to identify the amount of sales in each neighborhood OR (if possible) identify where the payday lenders charging the highest interest rates (200-400%) are located and how many of them are in each census tract.
#load(file="BA_payday_2018.RData")
#proj4string(BA_payday_2018)
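The percentage step of the 2017 measure can be sketched as below. The sketch assumes the spatial join has already been done and produced one row per business with a tract identifier. The table, its identifiers, and its column names are hypothetical.

```r
# Hypothetical output of the spatial join: one row per payday lending business
payday <- data.frame(
  business_id = 1:8,
  fips = c("T1", "T1", "T1", "T1", "T2", "T2", "T3", "T3"),
  stringsAsFactors = FALSE
)

# Count businesses per tract
counts <- as.data.frame(table(fips = payday$fips),
                        responseName = "n_payday",
                        stringsAsFactors = FALSE)

# Share of all payday lending businesses in the region located in each tract
counts$pct_payday <- 100 * counts$n_payday / sum(counts$n_payday)
print(counts)
```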
Data Source: HUD subsidized housing projects
Spreadsheet: OV_Year_SubHous
Description:
• Data should be gathered from HUD instead of TCAC. Use the file obtained from HUD to create a point shapefile based on the latitude and longitude of each project (both are in the table).
• This table has all subsidized housing projects in California; use geoprocessing to clip the subsidized housing shapefile to the Bay Area.
• The map should include an analysis of projects and units, based on the subsidized units available and the number of subsidized programs in the region.
Data Source: Census Data
Spreadsheet: OV_Year_LowDen
Description: To analyze the density of each census tract and identify areas that are considered low density, defined as 40 or more acres per person.
• Calculate the area of each tract in acres, then divide it by the number of people in the tract; the results are stored in the POP_DEN field. All tracts with a value of 40 or above are highlighted on the map with a specific symbology.
Example:
Step 1: Create a new field, “Acres_per”, for each tract > Calculate Geometry > select Area > Coordinate System: Use Coordinate System of the data frame: PCS: NAD 1983 StatePlane California III FIPS 0403 > Units: Acres [US] (ac) > OK
Step 2: Then create a new field titled “POP_DEN”, whose value is the “Acres_per” value for each tract divided by the number of people in the tract > select the tracts that have a value of 40 or above
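The same calculation can be sketched in R once the tract areas are available as a column. The table and its values are hypothetical, and the handling of unpopulated tracts is an assumption that the steps above do not cover.

```r
# Hypothetical tract table with area in acres and total population
tracts <- data.frame(
  fips = c("T1", "T2", "T3", "T4"),
  Acres_per = c(500, 120000, 8000, 300),   # tract area in acres
  population = c(4000, 2500, 200, 0),
  stringsAsFactors = FALSE
)

# Acres per person; tracts with no population are set to NA (an assumption)
tracts$POP_DEN <- ifelse(tracts$population > 0,
                         tracts$Acres_per / tracts$population,
                         NA)

# Flag low-density tracts: 40 or more acres per person
tracts$low_density <- tracts$POP_DEN >= 40
print(tracts)
```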
5. Social Capital
This is our newest domain, which includes the average distance to a religious institution, the voting rate of registered voters, and the average distance to club membership, among other indicators.